feat: refine hpc_tuning and add additional tunings#70
feat: refine hpc_tuning and add additional tunings#70lixuemin2016 wants to merge 1 commit intolinux-system-roles:mainfrom
Conversation
Reviewer's GuideRefines the HPC tuning Ansible role by tightening resource limits, managing sunrpc settings via a dedicated systemd service, and adding safeguards like removing azsec-monitor and re-execing/reloading systemd when configuration changes occur. Sequence diagram for sunrpc_tcp_settings systemd-based tuningsequenceDiagram
participant Ansible
participant Systemd
participant SunrpcService as sunrpc_tcp_settings.service
participant Script as sunrpc_tcp_settings.sh
participant Kernel as sunrpc_module
Ansible->>Systemd: Install sunrpc_tcp_settings.service
Ansible->>Systemd: daemon-reload
Ansible->>Systemd: Enable sunrpc_tcp_settings
Ansible->>Systemd: Start sunrpc_tcp_settings
Systemd->>SunrpcService: Activate service
SunrpcService->>Script: ExecStart /usr/sbin/sunrpc_tcp_settings.sh
Script->>Kernel: modprobe sunrpc
Script->>Systemd: sysctl -p
SunrpcService-->>Systemd: Exit status 0
Ansible->>Systemd: systemctl is-active sunrpc_tcp_settings
Systemd-->>Ansible: active
Sequence diagram for updating systemd MEMLOCK limitssequenceDiagram
participant Ansible
participant Systemd
participant SystemConf as system.conf
participant UserConf as user.conf
Ansible->>SystemConf: Ensure DefaultLimitMEMLOCK=infinity
Ansible->>UserConf: Ensure DefaultLimitMEMLOCK=infinity
Ansible->>Systemd: daemon-reexec
Systemd-->>Ansible: New limits applied for services
Flow diagram for HPC tuning Ansible task blockflowchart TD
A["Start hpc_tuning block"] --> B["Remove azsec-monitor package if present"]
B --> C["Write 90-hpc-limits.conf with updated nofile limits"]
C --> D["Set DefaultLimitMEMLOCK=infinity in system.conf"]
D --> E["Set DefaultLimitMEMLOCK=infinity in user.conf"]
E --> F["Write 90-hpc-sysctl.conf and reload sysctl"]
F --> G["Deploy sunrpc_tcp_settings.sh to /usr/sbin"]
G --> H["Deploy sunrpc_tcp_settings.service to /etc/systemd/system"]
H --> I["systemctl daemon-reload"]
I --> J["Enable and start sunrpc_tcp_settings service"]
J --> K["Verify sunrpc_tcp_settings is active"]
K --> L["End hpc_tuning block"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
lineinfiletasks forDefaultLimitMEMLOCK=infinitywill not update an existingDefaultLimitMEMLOCKentry with a different value; consider using aregexpto replace any existing line rather than just appending a new one. - The
sunrpc_tcp_settings.shscript runssysctl -p, which reloads all sysctl settings and might have unintended side effects; it would be safer to scope this to the specific sysctl config file(s) you manage (e.g.,sysctl -p /etc/sysctl.d/90-hpc-sysctl.conf) or usesysctl -wfor the needed keys.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `lineinfile` tasks for `DefaultLimitMEMLOCK=infinity` will not update an existing `DefaultLimitMEMLOCK` entry with a different value; consider using a `regexp` to replace any existing line rather than just appending a new one.
- The `sunrpc_tcp_settings.sh` script runs `sysctl -p`, which reloads all sysctl settings and might have unintended side effects; it would be safer to scope this to the specific sysctl config file(s) you manage (e.g., `sysctl -p /etc/sysctl.d/90-hpc-sysctl.conf`) or use `sysctl -w` for the needed keys.
## Individual Comments
### Comment 1
<location> `templates/sunrpc_tcp_settings.sh:6` </location>
<code_context>
+{{ "system_role:hpc" | comment(prefix="", postfix="") }}
+
+modprobe sunrpc
+sysctl -p
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using `sysctl -p` without a specific file may reload and depend on global sysctl configuration
This reloads `/etc/sysctl.conf` and all of `/etc/sysctl.d`, which can introduce unrelated changes and fail due to invalid, non-HPC sysctl entries. To limit scope and improve robustness, invoke `sysctl -p` with the specific HPC (or sunrpc-specific) configuration file instead.
Suggested implementation:
```
sysctl -p {{ sunrpc_sysctl_config | default('/etc/sysctl.d/99-hpc-sunrpc-tcp.conf') }}
```
1. Ensure a matching sysctl configuration file (e.g. `/etc/sysctl.d/99-hpc-sunrpc-tcp.conf`) is created/managed by this role, containing only the sunrpc/HPC-specific sysctl settings.
2. Optionally define the `sunrpc_sysctl_config` variable in the role defaults or inventory if a different path is desired.
3. Confirm that the managed sysctl file is written before this script is executed so that `sysctl -p` can successfully load it.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
templates/sunrpc_tcp_settings.sh
Outdated
| {{ "system_role:hpc" | comment(prefix="", postfix="") }} | ||
|
|
||
| modprobe sunrpc | ||
| sysctl -p |
There was a problem hiding this comment.
suggestion (bug_risk): Using sysctl -p without a specific file may reload and depend on global sysctl configuration
This reloads /etc/sysctl.conf and all of /etc/sysctl.d, which can introduce unrelated changes and fail due to invalid, non-HPC sysctl entries. To limit scope and improve robustness, invoke sysctl -p with the specific HPC (or sunrpc-specific) configuration file instead.
Suggested implementation:
sysctl -p {{ sunrpc_sysctl_config | default('/etc/sysctl.d/99-hpc-sunrpc-tcp.conf') }}
- Ensure a matching sysctl configuration file (e.g.
/etc/sysctl.d/99-hpc-sunrpc-tcp.conf) is created/managed by this role, containing only the sunrpc/HPC-specific sysctl settings. - Optionally define the
sunrpc_sysctl_configvariable in the role defaults or inventory if a different path is desired. - Confirm that the managed sysctl file is written before this script is executed so that
sysctl -pcan successfully load it.
|
It looks like the sunrpc service does two things:
Note that you do not need a service to do this - this can be done automatically by the os.
Is there some reason you need a custom systemd service to do this? |
tasks/main.yml
Outdated
| - name: Create sunrpc_tcp_settings.sh script | ||
| template: | ||
| src: sunrpc_tcp_settings.sh | ||
| dest: /usr/sbin/sunrpc_tcp_settings.sh |
There was a problem hiding this comment.
Technically the name should be "Install the sunrpc_tcp_settings.sh script" - we are not creating it...
Secondly, custom scripts like this go in /opt/hpc/azure/bin, aka "{{ __hpc_azure_resource_dir }}/bin", not in system binary directories.
templates/sunrpc_tcp_settings.sh
Outdated
| {{ ansible_managed | comment }} | ||
| {{ "system_role:hpc" | comment(prefix="", postfix="") }} | ||
|
|
||
| modprobe sunrpc |
There was a problem hiding this comment.
Also, this should fail if the kernel module loading fails. That way the failure gets propagated to teh service that invoked the script. However, I think the '-eu' addition to the shebang line I mentioned above will trigger script failure correctly here.
There was a problem hiding this comment.
@dgchinner Got it, this shell file is no need anymore since I keep using old installation method for consistency and old installation method is easier to maintain.
tasks/main.yml
Outdated
| notify: Restart systemd-modules-load | ||
| notify: Reload systemd |
There was a problem hiding this comment.
I think this removes the only user of the systemd-modules-load handler. If it is now unused, please remove it from the handlers/main.yml file.
Hi @richm @dgchinner thank you so much for your review. I use
Hi @richm @dgchinner thank you so much for review and detailed comment. Using a custom systemd service mainly is refer to the https://github.com/Azure/azhpc-images/blob/master/components/hpc-tuning.sh which uses custom systemd service. I will test sunrpc kernel module method and change back if no issue. |
Refine existing hpc_tuning and add additional system-level tunings - Remove azsec-monitor to prevent CPU utilization by azsec-monitor - Add DefaultLimitMEMLOCK configuration for systemd Check result: - sysctl sunrpc.tcp_max_slot_table_entries sunrpc.tcp_max_slot_table_entries = 128 - systemctl show --property=DefaultLimitMEMLOCK DefaultLimitMEMLOCK=infinity JIRA: RHELHPC-109 Signed-off-by: Xuemin Li <xuli@redhat.com>
@richm Thank you so much. Updated, keep using original method to install kernel module sunrpc. |
Enhancement:
Refine existing hpc_tuning and add additional system-level tunings
Reason:
Add more system-level tunings and refine current hpc_tuning
Note: It does not modify the main file /etc/systemd/system.conf, using
drop-in /etc/systemd/system.conf.d/99-memlock.conf which is safer and cleaner
Result:
Issue Tracker Tickets (Jira or BZ if any):
JIRA: RHELHPC-109
Some history information about customized systemd service method for reference only.
cat /etc/systemd/system.conf | grep -i DefaultLimitMEMLOCK=infinity
DefaultLimitMEMLOCK=infinity
cat /etc/systemd/user.conf | grep -i DefaultLimitMEMLOCK=infinity
DefaultLimitMEMLOCK=infinity
systemctl status sunrpc_tcp_settings
● sunrpc_tcp_settings.service - Set sunrpc tcp settings
Loaded: loaded (/etc/systemd/system/sunrpc_tcp_settings.service; enabled; preset: disabled)
Active: active (exited) since Mon 2026-02-09 07:39:18 UTC; 38min ago
Main PID: 19764 (code=exited, status=0/SUCCESS)
CPU: 5ms
systemd[1]: Starting Set sunrpc tcp settings...
systemd[1]: Finished Set sunrpc tcp settings.
systemctl restart sunrpc_tcp_settings
systemctl status sunrpc_tcp_settings
● sunrpc_tcp_settings.service - Set sunrpc tcp settings
Loaded: loaded (/etc/systemd/system/sunrpc_tcp_settings.service; enabled; preset: disabled)
Active: active (exited) since Mon 2026-02-09 08:18:16 UTC; 3s ago
Process: 35183 ExecStart=/usr/sbin/sunrpc_tcp_settings.sh (code=exited, status=0/SUCCESS)
Main PID: 35183 (code=exited, status=0/SUCCESS)
CPU: 5ms
systemd[1]: Starting Set sunrpc tcp settings...
systemd[1]: Finished Set sunrpc tcp settings.
Summary by Sourcery
Refine HPC system tuning by expanding system-level resource limits, managing azsec-monitor, and replacing direct sunrpc module loading with a dedicated systemd service.
New Features:
Enhancements:
Documentation: